Datamarts Interface

Datamarts

Datamarts can be used to upload and manage data in Exasol from Aunsight.

Shifting to Datamarts

Datamart queries were primarily a replacement for the Datalake queries (via Apache Drill) which previously powered the Daybreak UI.There are several reasons for this shift:

  • Datalake queries operate on data at rest in a datalake. At-rest data is rarely optimized for rapid access or processing. Datalake queries are useful for bulk processing in a solution pipeline or occasional ad-hoc analysis which can afford to wait 10s of seconds (at minimum) for results. Datamart queries are typically sub-second for a much better interactive user experience.
  • Datamart queries are optimized for rapid, interactive querying. The tradeoff is that they must be explicitly provisioned & data must be refreshed via a load.
  • While Apache Drill can be used by BI tools, an analytical database such as Exasol is a much more appropriate mechanism to deliver query-able data for client direct use
  • While Apache Drill does support joins and such, a stateful database will be much more efficient since it can build indexes and learn from the query patterns
  • It was always the intention of the Aunsight platform team to adopt a “real” database eventually. Integrating Exasol was the fulfillment of that plan.
  • Exasol has a more appropriate concurrency and performance model for interactive use cases. Drill can easily start queuing under load.

Concepts

  • Datamart - A collection of tables in the query acceleration layer which are intended for interactive analysis
  • Datalake - A storage location optimized for retention rather than rapid access
  • Query Acceleration Layer - A system layer which contains databases optimized for interactive querying of data
  • Exasol - An analytical database optimized for rapid, interactive querying
  • Migration - The process of transforming a datamart from one schema to the next via a series of specified actions
  • Migration Action - An atomic step in a migration plan indicating some specific transformation to perform on the datamart
  • Migration Plan - A series of migration actions to be run for a migration
  • Dataset - Managed data in an Aunsight datalake described by an Atlas record
  • DSV - A text-based, Delimiter-Separated-Value dataset format. Common delimiters include comma, pipe and tab.
  • Parquet - A binary-based dataset format organized in a columnar fashion. It is optimized for machine rather than human consumption. Aunsight support was added relatively recently, and adoption/support is increasing.
  • Native View - A stored query in a database which is presented as a virtual table. The view does not contain any stored data, but rather lays on top of other tables/views. Views can reduce the storage footprint while tailoring data access. There may be management and performance implications depending on the query complexity.
  • Datamart Key - A unique key within an Exasol schema that identifies tables managed by a datamart in a more human-friendly manner rather than using the datamart id
  • Exasol Schema - A table/view namespace within Exasol

Provisioning/Migration

Datamart provisioning involves using migrations to transform tables and schema. Migrations consist of actions like creating, dropping, and altering tables, views, and columns. A migration plan can be generated to determine actions needed for transformation. As solutions grow, adjusting datamart structure while preserving data becomes necessary. Familiarity with migrations is important for solution teams. After testing, migrations can be added to version control for easy deployment. Each datamart in Exasol must have a unique key. Migrations are run as transactions, ensuring success for all actions. Multiple small migrations are used during development, while production deployment involves a selected set. External BI tools require correct key referencing.

For more infromation on migration generation Click Here

For more infromation on the different migration actions Click Here